System and method for a multi-packet data link layer data transmission

ABSTRACT

A kernel data transfer method and system for transmitting multiple packets of data in a single block of data presented by application programs to the kernel's network subsystem for processing in accordance with data transfer parameters set by the application program. The multi-packet transmit system includes logic that allows header information of the multiple packets of data to be generated in a single buffer and appended to a second buffer containing the data packets to be transmitted through the network stack. The multi-data transmit system allows a device driver to amortize the input/output memory management related overhead across a number of packets. With some assistance from the network stack, the device driver needs only to perform the necessary IOMMU operations on two contiguous memory blocks representing the header information and the data payload of multiple packets during each transmit call.

CROSS REFERENCE TO RELATED APPLICATION

[0001] This is a continuation-in-part of Masputra et al., U.S. patent application Ser. No.: 10/170,919, filed Jun. 12, 2002, attorney docket No.: SUN-P7826, entitled “A SYSTEM AND METHOD FOR A MULTI-DATA NETWORK LAYER TRANSMIT INTERFACE”. To the extent not repeated herein, the contents of Masputra et al. are incorporated herein by reference.

FIELD OF THE INVENTION

[0002] The present claimed invention relates generally to the field of computer operating systems. More particularly, embodiments of the present claimed invention relate to a system and method for a multi-packet data link layer data transmission.

BACKGROUND ART

[0003] A computer system can be generally divided into four components: the hardware, the operating system, the application programs and the users. The hardware (e.g., central processing unit (CPU), memory and input/output (I/O) devices) provides the basic computing resources. The application programs (e.g., database systems, games, business programs, etc.) define the ways in which these resources are used to solve computing problems. The operating system controls and coordinates the use of the hardware among the various application programs for the various users. In so doing, one goal of the operating system is to make the computer system convenient to use. A secondary goal is to efficiently make use of the hardware.

[0004] The Unix operating system (Unix) is currently used by many enterprise computer systems. Unix was designed to be a simple time-sharing system, with a hierarchical file system, which supports multiple processes. A process is the execution of a program and consists of a pattern of bytes that the CPU interprets as machine instructions or data.

[0005] Unix consists of two separable parts which include the “kernel” and “system programs.” System programs typically consist of system libraries, compilers, interpreters, shells and other such programs which provide useful functions to the user. The kernel is the central controlling program that provides basic system facilities. For example, the Unix kernel creates and manages processes, provides functions to access file-systems, and supplies communications facilities.

[0006] The Unix kernel is the only part of the Unix operating system that a user cannot replace. The kernel also provides the file system, CPU scheduling, memory management and other operating-system functions by responding to “system-calls.” Conceptually, the kernel is situated between the hardware and the users. System calls are the means for the programmer to communicate with the kernel.

[0007] System calls are made by a “trap” to a specific location in the computer hardware (sometimes called an “interrupt” location or vector). Specific parameters are passed to the kernel on the stack and the kernel returns with a code in specific registers indicating whether the action required by the system call was completed successfully or not.

[0008] FIG. 1 is a block diagram illustration of a prior art computer system 100. The computer system 100 is connected to an external storage device 180 and to a network interface device 120 through which computer programs can be loaded into computer system 100. External storage device 180 and network interface device 120 are connected to the computer system 100 through respective bus lines. Computer system 100 further includes main memory 130 and processor 110. Device 120 can be a computer program product reader such as a floppy disk drive, an optical scanner, a CD-ROM device, etc.

[0009] FIG. 1 additionally shows memory 130 including a kernel level memory 140. Memory 130 can be virtual memory which is mapped onto physical memory including RAM or a hard drive, for example. During process execution, a programmer programs data structures in the memory at the kernel level memory 140.

[0010] The kernel in FIG. 1 comprises a network subsystem. The network subsystem provides a framework within which many network architectures may co-exist. A network architecture comprises a set of network-communication protocols, the conventions for naming communication end-points, etc.

[0011] The kernel network subsystem 140 comprises three logical layers as illustrated in FIG. 2. These three layers manage the following tasks in the kernel: inter-process data transport; internetworking addressing and message routing; and transmission media support. The prior art kernel network subsystem 200 shown in FIG. 2 comprises a transport layer 220, a network layer 230, and a data link layer 240. The transport layer 220 is the top-most layer in the network subsystem 200.

[0012] The transport layer 220 provides an addressing structure that permits communication between network sockets and any protocol mechanism necessary for socket semantics, such as reliable data delivery. The second layer is the network layer 230. The network layer 230 is responsible for the delivery of data destined for remote transport or network layer protocols. In providing inter-network delivery, the network layer 230 manages a private routing database or utilizes system-wide facilities for routing messages to their destination host.

[0013] The lowest layer in the network subsystem is the data link layer 240, also referred to as the network interface layer. The data link layer 240 is responsible for transporting messages between hosts connected to a common transmission medium. The data link layer 240 is mainly concerned with driving the transmission media involved and performing any necessary link-level protocol encapsulation and de-encapsulation.

[0014] FIG. 3 is a block diagram of a prior art internet protocol (IP) for the network subsystem 200. Although FIG. 3 describes an IP network subsystem, FIG. 3 is equally applicable to other network protocols, such as Netbios, Appletalk, IPX/SPX, etc. The Internet protocol in FIG. 3 provides a framework in which host machines connecting to the kernel 140 are connected to networks with varying characteristics and the networks are interconnected with gateways. The Internet protocol illustrated in FIG. 3 is designed for packet switching networks ranging from those which provide reliable message delivery and notification of failure to pure datagram networks, such as the Ethernet, that provide no indication of datagram delivery.

[0015] The IP layer 300 is the level responsible for host-to-host addressing and routing, packet forwarding, and packet fragmentation and re-assembly. Unlike the transport protocols, it does not always operate on behalf of a socket or the local links. It may forward packets, receive packets for which there is no local socket, or generate error packets in response. The information needed for the functions performed by the IP layer 300 is contained in the packet header. The packet header identifies source and destination hosts and the destination protocol.

[0016] The IP layer 300 processes data packets in one of four ways: 1) the packet is passed as input to a higher-level protocol; 2) the packet encounters an error which is reported back to the source; 3) the packet is dropped because of an error; or 4) the packet is forwarded along a path to its destination.

[0017] The IP layer 300 further processes any IP options in the header, checks packets by verifying that the packet is at least as long as an IP header, checksums the header and discards the packet if there is an error, verifies that the packet is at least as long as the header indicates and checks whether the packet is for the targeted host. If the packet is fragmented, the IP layer 300 keeps it until all its fragments are received and reassembled or until it is too old to keep.
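The header checksum step described above can be sketched in C as the standard Internet one's complement checksum. This is a minimal illustration only; the function name `inet_checksum` is hypothetical and is not taken from any prior art implementation discussed herein. When run over a received IP header with its checksum field left in place, a result of zero indicates an intact header.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: one's complement checksum over len bytes.
 * Bytes are combined as big-endian 16-bit words; an odd trailing
 * byte is padded with zero on the right.
 */
uint16_t inet_checksum(const uint8_t *data, size_t len)
{
    uint32_t sum = 0;

    while (len > 1) {
        sum += ((uint32_t)data[0] << 8) | data[1];
        data += 2;
        len -= 2;
    }
    if (len == 1)
        sum += (uint32_t)data[0] << 8;

    /* Fold the carries back into the low 16 bits. */
    while (sum >> 16)
        sum = (sum & 0xFFFF) + (sum >> 16);

    return (uint16_t)~sum;
}
```

A sender would compute the value with the checksum field set to zero and store the result in that field; a receiver would discard the packet when the function returns a non-zero value over the full header.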

[0018] The major protocol of the Internet protocol suite is the TCP layer 310. The TCP layer 310 is a reliable, connection-oriented stream transport protocol on which most application protocols are based. It includes several features not found in the other transport and network protocols: explicit and acknowledged connection initiation and termination; reliable, in-order, unduplicated delivery of data; flow control; and out-of-band indication of urgent data.

[0019] The data may typically be sent in packets of small sizes and at varying intervals; for example, when they are used to support a login session over the network. The stream initiation and termination are explicit events at the start and end of the stream, and they occupy positions in the sequence space of the stream so that they can be acknowledged in the same manner as the data.

[0020] A TCP packet contains an acknowledgement and a window field as well as data, and a single packet may be sent if any of these three changes. A naïve TCP send might send more packets than necessary. For example, consider what happens when a user types one character to a remote-terminal connection that uses remote echo. The server side TCP receives a single-character packet. It might send an immediate acknowledgement of the character. Then milliseconds later, the login server would read the character, removing it from the receive buffer. The TCP might immediately send a window update notice that one additional octet of send window is available. After another millisecond or so, the login server would send an echo character of input.

[0021] All three responses (the acknowledgement, the window updates and the data returns) could be sent in a single packet. However, if the server were not echoing input data, the acknowledgement cannot be withheld for too long a time, or the client-side TCP would begin to retransmit.

[0022] In the network subsystem illustrated in FIGS. 1-3, the underlying operating system has limited capabilities for handling bulk-data transfer. For many years, there have been attempts to formulate network throughput as directly correlated with the underlying host CPU speed, i.e., 1 megabit per second (Mbps) of network throughput per 1 megahertz (MHz) of CPU speed. Although such paradigms may have been sufficient in the past for low bandwidth network environments, they may not be adequate for today's high-speed networking mediums, where bandwidths specified in units of gigabits per second (Gbps) are becoming increasingly common and create a tremendous overhead processing cost for the underlying network software.

[0023] Networking software overhead can be classified into per-byte and per-packet costs. Prior analysis of per-byte data movement cost in prior art operating system networking stacks shows that the data copy function and the checksum overhead function dominate host CPU processing time. Other analysis of the per-packet cost has revealed that the overhead associated with some prior art operating systems is as significant as the per-byte costs.

[0024] To illustrate the prior overhead costs of processing and transmitting data in the kernel's network subsystem, FIG. 4 is a prior art illustration of a kernel network subsystem 400 having a stream head module 420 for generating network data for transmission in the network subsystem 400. The stream head module 420 is the end of the stream nearest the user process. All system calls made by user-level applications on a stream are processed by the stream head module 420. The stream head module 420 typically copies the application data from user buffers into kernel buffers, and during the copying process, it may divide the data into small chunks, based on the header and data payload. The stream head module 420 may also reserve some extra space in front of each allocated kernel buffer depending on the static packet value.

[0025] Currently, the TCP module 430 utilizes these parameters in an attempt to optimize the transmit dynamics and reduce allocation cost for the TCP/IP and link-layer headers in the kernel. By setting the data packet to a size large enough to hold the headers while setting the data to a maximum TCP segment size, the TCP module 430 effectively instructs the stream head module 420 to divide the application data into two kernel buffers for every system call to the TCP module 430 to transmit a single data packet.

[0026] For applications which transmit bulk data, it is not uncommon to see buffer sizes in the range of 32 KB, 64 KB, or larger. Applications typically inform the TCP module 430/IP module 440 of this size in order for the modules to configure and possibly optimize the transmit characteristics, by configuring the send buffer size. Ironically for the TCP module 430, this strategy has no effect in optimizing the stream head module 420 behavior, due to the fact that the user buffer is broken up into maximum segment size (MSS) chunks that the TCP module 430 can handle.

[0027] For example, a 1 MB user buffer written to the socket causes over 700 kernel buffer allocations in the typical 1460-byte MSS case, regardless of the configured send buffer size. This method is quite inefficient, not only because of the costs incurred per allocation, but also because the application data written to the socket cannot be kept in larger contiguous chunks.
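The allocation count in this example can be checked with a short calculation. The sketch below assumes that 1 MB means 1,048,576 bytes and that each MSS-sized chunk costs one kernel buffer allocation; the function name `mss_chunk_count` is hypothetical.

```c
#include <stddef.h>

/*
 * Hypothetical helper: number of MSS-sized chunks needed to carry
 * buf_len bytes, rounding up for a final partial chunk.
 * Returns 0 when mss is 0 to avoid dividing by zero.
 */
size_t mss_chunk_count(size_t buf_len, size_t mss)
{
    if (mss == 0)
        return 0;
    return buf_len / mss + (buf_len % mss != 0);
}
```

Under these assumptions, `mss_chunk_count(1024 * 1024, 1460)` yields 719 chunks (718 full chunks plus one of 296 bytes), which is consistent with the figure of over 700 allocations.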

[0028] In the prior art systems shown in FIGS. 1-4, a socket's packet processing consists of the stream head 420, the transport module 430, the network module 440 and the driver 450. Application data residing in the kernel buffers are sent down through each module's queue via a STREAMS framework. The framework determines the destination queue for the message, hence providing a sense of abstraction between the modules.

[0029] In the system shown in FIG. 4, data is a contiguous block of memory which is divided into small chunks of data that could be transmitted to a link partner and re-assembled to reproduce a copy of the original data. The number of times that the data packet is divided up depends on how many layers the data goes through. Each layer through which the data is transmitted adds a header to the chunk to facilitate the reception and re-assembly on the link partner. The sub-division of the data and appending headers for each layer can become costly when data gets to the data link provider interface (DLPI) layer. The DLPI layer is only designed to send one packet at a time. If the original data block is left intact and the headers are built on a second page, it may be possible to give the hardware two blocks of memory, header memory and a payload memory. However, assembling the data chunks can still prove to be costly.

[0030] One prior art solution to the large processing overhead cost of handling bulk data transmission is the implementation of a hardware large send offload feature. The large send offload is a hardware feature implemented by prior art Ethernet cards that virtualizes the link maximum transmission unit, typically up to 64 KB, from the network stack. This enables the TCP/IP modules to reduce per-packet costs by the increased virtual packet size. Upon receiving the jumbo packet from the networking stack, the NIC driver instructs the on-board firmware to divide the TCP payload into smaller segments (packets) whose sizes are based on the real TCP MSS (typically 1460 bytes). Each of these segments of data is then transmitted along with the TCP/IP header created by the firmware, based on the TCP/IP header of the jumbo packet as shown in FIG. 5.
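The segmentation performed by the firmware can be pictured with the following sketch, which computes where a given segment starts and how long it is when a large payload is cut at MSS boundaries. It is an illustration under stated assumptions, not firmware code: the function name `lso_segment_bounds` is hypothetical, and header regeneration is omitted entirely.

```c
#include <stddef.h>

/*
 * Hypothetical helper: locate segment idx (0-based) of a payload of
 * payload_len bytes divided into MSS-sized segments. On success,
 * stores the byte offset and length of the segment and returns 0.
 * Returns -1 if mss is 0 or idx lies past the last segment.
 */
int lso_segment_bounds(size_t payload_len, size_t mss, size_t idx,
                       size_t *off, size_t *len)
{
    size_t start, remaining;

    /* The division-based check also keeps idx * mss from overflowing. */
    if (mss == 0 || idx > payload_len / mss)
        return -1;

    start = idx * mss;
    if (start >= payload_len)
        return -1;

    remaining = payload_len - start;
    *off = start;
    *len = remaining < mss ? remaining : mss;
    return 0;
}
```

For an assumed payload of 65,536 bytes and an MSS of 1460 bytes, segments 0 through 43 are full-sized, segment 44 begins at offset 64,240 and carries the remaining 1,296 bytes, and any higher index is rejected.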

[0031] Although this prior art solution substantially reduces the per-packet transmission costs, it does not provide a practical solution because this solution is exclusively tailored for TCP and depends on the firmware's ability to correctly parse and generate the TCP/IP headers (including IP and TCP options). Additionally, due to the virtual size of the packets, many protocols and/or technologies which operate on the real headers and payload, e.g., IPsec, will cease to function. It also breaks the TCP processes by luring the TCP module 430 into using a larger maximum transmission unit (MTU) compared to the actual link MTU. Since the connection endpoints have a different notion of the TCP MSS, it inadvertently brings harm to the congestion control processes used by TCP. Doing so would introduce unwanted behavior, such as a high rate of retransmissions caused by packet drops.

[0032] The packet chaining data transmissions of the prior art system therefore require data to be transmitted through the network subsystem in small packets. Also required is the creation of individual headers to go with each packet, which requires the sub-layers of the network subsystem to transmit pieces of the same data, due to the fixed packet sizes, from a source to a destination host. Such transmission of data packets is not only time consuming and cumbersome, but very costly and inefficient. Supporting protocols other than TCP over plain IP would require changes to the firmware, which in itself is already complicated and poses a challenge for rapid software development/test cycles. Furthermore, full conformance to the TCP protocol demands some fundamental changes to the operating system networking stack implementation, where a concept of virtual and real link MTU is needed.

SUMMARY OF INVENTION

[0033] Accordingly, to take advantage of the many application programs available and the increasing number of new applications being developed and the requirement of these new applications for fast network bandwidth, a system is needed that optimizes data transmission through a kernel network subsystem. Further, a need exists for solutions to allow for the multi-packet transfer of data in a computer system without incurring the costly delay of transmitting each piece of data with associated header information appended to the data before transmitting the data. A need further exists for an improved and less costly method of transmitting data without the inherent prior art problems of streaming individual data packet headers with each data transmitted in the network subsystem. A need further exists for a data link provider interface layer extension primitive that is flexible and scalable to send batches of data packets or split header and payload packets for transmission to requesting network devices.

[0034] What is described herein is a computer system having a kernel network subsystem that provides a mechanism and a technique for providing a multi-packet data transfer from applications to the network subsystem of the kernel without breaking down the data into small data packets. Embodiments of the present invention allow programmers to optimize data flow through the kernel's network subsystem on the main data path connection between the transport connection protocol and the Internet protocol suites of the kernel.

[0035] Embodiments of the present invention allow multi-packet data sizes to be dynamically set in order to avoid a breakdown of application data into small sizes prior to being transmitted through the network subsystem. In one embodiment of the present invention, the computer system includes a kernel transport layer transmit interface system that includes optimization logic that enables kernel modules to transmit multiple packets in a single block of application data using a bulk transfer of such packets without repetitive send and resend operations. In one embodiment, the present invention enables header information from the multiple packets of data to be separated from the corresponding payload information for transmission to a requesting network device.

[0036] The multi-packet transmit logic further provides a programmer with a number of semantics that may be applied to the extension data along with the manipulation interfaces that interact with the data. The transport layer transmit interface logic system of the present invention further allows the data packetizing to be implemented dynamically according to the data transfer parameters of the underlying kernel application program.

[0037] Embodiments of the present invention further include packet information logic that processes information required to access header and payload data in each packet block. The packet information logic includes offsets and packet length information which may be used in conjunction with header area base address and payload area base address information that is required to load a request to the network device.
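One way to picture how offsets and lengths combine with the two base addresses is the C sketch below. The structure and function names (`pkt_info`, `pkt_view`, `pkt_resolve`) and the field widths are assumptions made for illustration; the specification does not define a concrete layout.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-packet record; offsets are relative to the base of
 * the header area and the payload area respectively. */
struct pkt_info {
    uint32_t hdr_off;
    uint32_t hdr_len;
    uint32_t pld_off;
    uint32_t pld_len;
};

/* Resolved addresses and lengths for one packet. */
struct pkt_view {
    const uint8_t *hdr;
    uint32_t       hdr_len;
    const uint8_t *pld;
    uint32_t       pld_len;
};

/* Combine the base addresses with one packet's offsets. The caller is
 * assumed to have checked that the offsets lie inside both areas. */
struct pkt_view pkt_resolve(const uint8_t *hdr_base, const uint8_t *pld_base,
                            const struct pkt_info *pi)
{
    struct pkt_view v;

    v.hdr = hdr_base + pi->hdr_off;
    v.hdr_len = pi->hdr_len;
    v.pld = pld_base + pi->pld_off;
    v.pld_len = pi->pld_len;
    return v;
}
```

Because each record stores only offsets, the same array of records stays valid if the header and payload areas are remapped to different addresses, which fits a design where mapping is done once per area rather than once per packet.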

[0038] Embodiments of the present invention also include packet offload information logic that provides offload types and corresponding offsets that are implemented as pointers to the next offload in the multi-packet data block. The packet offload information comprises offload offset information that enables offloads from the packets to be multi-threaded for transmission. These offloads also allow for one's complement checksumming, internet protocol checksumming, etc., of the multi-packet data block.

[0039] Embodiments of the present invention further include layer 2 addressing logic. The layer 2 addressing logic allows the multi-packet transmission unit of the present invention to transmit the header and payload information of the multi-packet data block as layer 2 addressing packets. The format of the layer 2 address is given by the DLPI specification which allows the layer 2 address to apply to all the packets in the multi-packet data block in a particular request.

[0040] Embodiments of the present invention further include data linking logic for linking the header and segment data buffers together to define the single data block representing an array of packets to be transmitted each transmission cycle.

[0041] These and other objects and advantages of the present invention will no doubt become obvious to those of ordinary skill in the art after having read the following detailed description of the preferred embodiments which are illustrated in the various drawing figures.

BRIEF DESCRIPTION OF THE DRAWINGS

[0042] The accompanying drawings, which are incorporated in and form a part of this specification, illustrate embodiments of the invention and, together with the description, serve to explain the principles of the invention:

[0043] FIG. 1 is a block diagram of a prior art computer system;

[0044] FIG. 2 is a block diagram of software layers of a prior art kernel subsystem;

[0045] FIG. 3 is a block diagram of software layers of a network subsystem of a prior art kernel;

[0046] FIG. 4 is a block diagram of software layers of a prior art network module of a prior art kernel;

[0047] FIG. 5 is a block diagram of a prior art packet handling between the TCP and IP modules of FIG. 4;

[0048] FIG. 6 is a block diagram of a computer system of one embodiment of the present invention;

[0049] FIG. 7 is a block diagram of an exemplary network subsystem with an embodiment of the multi-data transmitter of the kernel subsystem in accordance with an embodiment of the present invention;

[0050] FIG. 8 is a block diagram of a packet organization of one embodiment of the present invention; and

[0051] FIG. 9 is a flow diagram of a method of a multi-packet transmission through the network layer of the kernel subsystem of one embodiment of the present invention.

DESCRIPTION OF THE PREFERRED EMBODIMENTS

[0052] Reference will now be made in detail to the preferred embodiments of the invention, examples of which are illustrated in the accompanying drawings. While the invention will be described in conjunction with the preferred embodiments, it will be understood that they are not intended to limit the invention to these embodiments.

[0053] On the contrary, the invention is intended to cover alternatives, modifications and equivalents, which may be included within the spirit and scope of the invention as defined by the appended Claims. Furthermore, in the following detailed description of the present invention, numerous specific details are set forth in order to provide a thorough understanding of the present invention. However, it will be obvious to one of ordinary skill in the art that the present invention may be practiced without these specific details. In other instances, well-known methods, procedures, components, and circuits have not been described in detail so as not to unnecessarily obscure aspects of the present invention.

[0054] The embodiments of the invention are directed to a system, an architecture, subsystem and method to process multiple data packets in a computer system that may be applicable to an operating system kernel. In accordance with an aspect of the invention, a multi-packet data transmission optimization system provides a programmer the ability to dynamically transmit multiple packets of application program data in a single bulk transmission in the transport layer of the kernel from a computer applications program over a computer network to a host device.

[0055] FIG. 6 is a block diagram illustration of one embodiment of a computer system 600 of the present invention. The computer system 600 according to the present invention is connected to an external storage device 680 and to a network interface drive device 620 through which computer programs according to the present invention can be loaded into computer system 600. External storage device 680 and drive device 620 are connected to the computer system 600 through respective bus lines. Computer system 600 further includes main memory 630 and processor 610. Drive 620 can be a computer program product reader such as a floppy disk drive, an optical scanner, a CD-ROM device, etc.

[0056] FIG. 6 additionally shows memory 630 including a kernel level memory 640. Memory 630 can be virtual memory which is mapped onto physical memory including RAM or a hard drive, for example, without limitation. During process execution, data structures may be programmed in the memory at the kernel level memory 640. According to the present invention, the kernel memory level includes a multi-data transmission module (MDT) 700. The MDT 700 enables a programmer to optimize data packet flow through the transport layer of the network subsystem of the kernel 640.

[0057] FIG. 7 is an exemplary block diagram illustration of one embodiment of the network subsystem with the MDT 700 of the kernel memory space of the present invention. The exemplary kernel memory space comprises MDT 700, kernel data generation module 710, transport module 720, network module 730 and device driver 740. The data generation module 710 provides the STREAM configuration for the present invention. The data generation module 710 generates multiple segments of data representing a single block of application data in response to multi-data transmit requests from the transport module.

[0058] The transport module 720 optimizes the performance of the main data path for an established connection for a particular application program. This optimization is achieved in part by the network module 730's knowledge of the transport module 720, which permits the network module 730 to deliver inbound data blocks to the correct transport instance and to compute checksums on behalf of the transport module 720. Additionally, the transport module 720 includes logic that enables it to substantially reduce the number of acknowledgment overheads in each data block processed in the network sub-system. In one embodiment of the present invention, the transport module 720 creates a single set of consolidated transport and network headers for multiple outgoing packets before sending the packets to the network module 730.

[0059] The network module 730 is designed around its job as a packet forwarder. The main data path through the network module 730 has also been highly optimized for both inbound and outbound data blocks to acknowledge and fully resolve addresses to ports the transport layer protocols have registered with the network module 730.

[0060] The network module 730 computes all checksums for inbound data blocks transmitted through the network sub-system. This includes not only the network header checksum, but also, in the transport cases, the transport header checksum. In one embodiment of the present invention, the network module 730 knows enough about the transport module 720 headers to access the checksum fields in their headers. The transport module 720 initializes headers in such a way that the network module 730 can efficiently compute the checksums on their behalf.

[0061] The multi-data transmitter 700 provides an extensible, packet-oriented and protocol-independent mechanism for reducing the per-packet transmission overhead associated with the transmission of large chunks of data in the kernel's network subsystem. In one embodiment of the present invention, the MDT 700 enables the underlying network device driver to amortize the input/output memory management unit (IOMMU) related overhead across a number of data packets transmitted in the kernel.

[0062] To reduce the overhead cost, the device driver needs only to perform the necessary IOMMU operations on two contiguous memory blocks representing the header information and the data payload associated with the transmitted block of data comprising multiple packets of data. In one embodiment of the present invention, the MDT 700 with the assistance of the kernel's networking stack performs only the necessary IOMMU operations on the two contiguous memory blocks representing the header buffer and the data payload buffer during each transmit call to the transport module 720.

[0063] The MDT 700 achieves this by instructing the data generation module 710 to copy larger chunks of the application data into the kernel's buffer. In one embodiment of the present invention, the MDT 700 avoids having dependencies on the underlying network hardware or firmware. The MDT 700 further avoids changing the data generation framework of the data generation module 710 to minimize the potential impact on the stability and performance of the underlying operating system. The MDT 700 advantageously provides a mechanism to increase network application throughput and achieve a better utilization of the host computer's CPU without having to modify the underlying operating system.

[0064] FIG. 8 is a block diagram illustration of a data request of one embodiment of the multi-packet transmission unit 700 of the present invention. As shown in FIG. 8, a multiple data request comprises a data request primitive structure 800, header page 820 and payload page 830.

[0065] The data request primitive 800 contains all the information required to transmit headers and payload information as regular layer 2 packets. The data primitive 800 is a data structure that is passed by the MDT 700 to the data link device that utilizes multi-data transmission capability. The data link driver uses the information provided in the data structure to inform the hardware of the location and length of headers and payloads within the multi-data buffers. This allows the hardware to piece together the data packets into Ethernet packets to the physical link.

[0066] The data request primitive 800 comprises header offset information 801, an array of per packet information 802-809, optional stack headers 810, per packet offload information 811 and layer 2 addressing 812. Associated with the multiple packet primitive 800 are header page 820 and payload page 830.
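A simplified model of such a request, together with a bounds check a driver might run before touching the pages, is sketched below. All names (`mdt_pkt`, `mdt_request`, `mdt_request_valid`) and field choices are hypothetical; the optional stack headers, offload information and layer 2 addressing listed above are left out to keep the sketch short.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-packet entry: where this packet's header and payload
 * sit inside the header page and the payload page. */
struct mdt_pkt {
    uint32_t hdr_off;
    uint32_t hdr_len;
    uint32_t pld_off;
    uint32_t pld_len;
};

/* Hypothetical, reduced form of the data request primitive. */
struct mdt_request {
    uint32_t              hdr_area_len; /* usable bytes in the header page  */
    uint32_t              pld_area_len; /* usable bytes in the payload page */
    uint32_t              pkt_count;    /* entries in pkts[]                */
    const struct mdt_pkt *pkts;         /* array of per-packet information  */
};

/* Returns 1 if [off, off + len) lies inside an area of area_len bytes.
 * Written to avoid overflow when adding off and len. */
static int range_fits(uint32_t off, uint32_t len, uint32_t area_len)
{
    return off <= area_len && len <= area_len - off;
}

/* Returns 1 if every packet's header and payload ranges fit inside
 * their respective areas, 0 otherwise. */
int mdt_request_valid(const struct mdt_request *req)
{
    uint32_t i;

    for (i = 0; i < req->pkt_count; i++) {
        const struct mdt_pkt *p = &req->pkts[i];

        if (!range_fits(p->hdr_off, p->hdr_len, req->hdr_area_len) ||
            !range_fits(p->pld_off, p->pld_len, req->pld_area_len))
            return 0;
    }
    return 1;
}
```

Keeping the per-packet entries as offsets into two shared areas mirrors the split between header page and payload page: the driver can describe many packets to the hardware while referring to only two memory regions.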

[0067] In one embodiment of the present invention, the multi-packet data transmission unit 700 comprises one main memory region. In this embodiment, the headers are implemented within the memory region containing the multi-packet data primitive 800. Furthermore, the headers can be packets accumulated for one multi-packet call, in which case the payload section 830 is optional. In another embodiment, the payload section 830 is not an optional implementation. In this case, groups of packets are packed into the payload 830 for transmission in a single transmission cycle.

[0068] The header page 820 originates in the top layer of the kernel's networking subsystem protocol stack and is used to build the packet headers for a transmission. The implementation of the packet headers in the header page 820 is protocol dependent, which allows the protocol developer to determine how to build the packets in the header page 820. The payload page 830 is fragmented according to each layer's requirements, and the headers maintain that information throughout the transmission cycle.

[0069] The payload 830 has two origins. One is at the application layer and the other is within the kernel. The payload 830 that originates at the application layer ends up in the kernel through an IOMMU operation that makes it visible to the kernel. Alternatively, a second, in-kernel payload page is created and the original is copied into it. Once in the kernel space, the payload is broken into link layer packet sizes and transmitted. Alternatively, the payload 830 may be transmitted as a collection of large packets in which the headers 820 are incorporated as part of the payload 830 section. In this embodiment, the payload is associated with a multi-packet primitive 800.
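As an illustration of breaking a payload into link layer packet sizes, the following sketch divides a payload of a given length into consecutive spans no larger than a maximum packet payload size. The function and type names are hypothetical, and the fixed maximum size is an assumption; the specification does not state how the size is chosen.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical description of one packet's slice of the payload page. */
typedef struct pld_span {
    size_t offset;              /* start of the slice within the payload */
    size_t len;                 /* length of the slice in bytes */
} pld_span_t;

/*
 * Split payload_len bytes into spans of at most max_pkt bytes each.
 * Returns the number of spans written to out, or 0 if max_pkt is 0
 * or out cannot hold all the spans.
 */
static size_t split_payload(size_t payload_len, size_t max_pkt,
                            pld_span_t *out, size_t out_cap)
{
    assert(out != NULL);
    if (max_pkt == 0)
        return 0;

    size_t n = (payload_len + max_pkt - 1) / max_pkt;   /* round up */
    if (n > out_cap)
        return 0;

    for (size_t i = 0; i < n; i++) {
        size_t remaining = payload_len - i * max_pkt;
        out[i].offset = i * max_pkt;
        out[i].len = remaining < max_pkt ? remaining : max_pkt;
    }
    return n;
}
```

Each resulting span would correspond to one entry of the per packet information array, paired with its own header in the header page 820.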

[0070] FIG. 9 is a flow diagram of one embodiment of the computer implemented multi-data transmit packet processing of the present invention. As illustrated in FIG. 9, multi-data transmit processing commences at step 900. At step 901, the MDT 700 retrieves the packet count in a multi-data request presented to the MDT 700. At step 902, the number of transmit descriptors required to complete the request is calculated from the packet header 820 and payload 830 counts.

[0071] At step 903 the MDT 700 determines whether there is enough space on the data link layer descriptor ring to handle the number of packets presented in a particular request. If there is enough space on the descriptor ring, processing continues at step 905. If the descriptor ring does not have enough space to handle packets in a transmit request, processing continues at step 904 where the request is posted in a streams queue for a transmit retry.
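Steps 902 and 903 can be sketched as follows. This is an illustration under stated assumptions, not the claimed implementation: it assumes one transmit descriptor per non-empty header buffer and one per non-empty payload buffer, and it models the descriptor ring only by its capacity and the number of descriptors in use. All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per packet lengths taken from the multi-data request. */
typedef struct pkt_lens {
    size_t hdr_len;             /* 0 means the packet has no header buffer */
    size_t pld_len;             /* 0 means the packet has no payload buffer */
} pkt_lens_t;

/* Hypothetical descriptor ring bookkeeping. */
typedef struct tx_ring {
    size_t capacity;            /* total descriptors on the ring */
    size_t in_use;              /* descriptors currently owned by hardware */
} tx_ring_t;

/* Step 902: count the transmit descriptors the request requires. */
static size_t descs_needed(const pkt_lens_t *pkts, size_t count)
{
    size_t needed = 0;
    for (size_t i = 0; i < count; i++) {
        if (pkts[i].hdr_len > 0)
            needed++;
        if (pkts[i].pld_len > 0)
            needed++;
    }
    return needed;
}

/* Step 903: check whether the ring can hold the whole request. */
static bool ring_has_space(const tx_ring_t *ring, size_t needed)
{
    assert(ring != NULL && ring->in_use <= ring->capacity);
    return ring->capacity - ring->in_use >= needed;
}
```

When `ring_has_space` returns false, the request as a whole would be queued for a later retry (step 904) rather than being posted in part.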

[0072] At step 905, the MDT 700 determines whether a request ready to be posted on the descriptor ring has a header buffer in the packet information. If the header buffer exists, processing continues at step 906 where the header buffer is locked down for a direct memory access. If, on the other hand, the packet header buffer does not exist, processing continues at step 907 where the packet is checked to determine whether a payload buffer is included in the packet information.

[0073] At step 908, if the packet information contains a payload buffer, the payload buffer is locked down for a direct memory access. At step 909, the MDT 700 determines whether there are more packets to be processed from the request. If there are no more packets to be processed, the network is informed at step 912 to service the packets just posted to the descriptor ring.

[0074] At step 910 the processed packet with the header and payload buffers is posted in the descriptor ring to be transmitted. At step 911, the next packet in the transmit request is processed and the MDT 700 advances to the next packet on the multi-data descriptor.
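The overall flow of FIG. 9 can be summarized in a short sketch. It is an illustrative model only: the type and function names are hypothetical, locking a buffer for direct memory access is represented by setting a flag, notifying the hardware is represented by a `kicked` flag, and it is assumed that processing falls through from step 906 to step 907 so that a packet may have both buffers locked.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { MDT_POSTED, MDT_QUEUED_FOR_RETRY } mdt_status_t;

typedef struct mdt_buf {
    size_t len;                 /* 0 means the buffer is absent */
    bool   locked;              /* stand-in for a DMA lock-down */
} mdt_buf_t;

typedef struct mdt_pkt {
    mdt_buf_t hdr;
    mdt_buf_t pld;
} mdt_pkt_t;

typedef struct mdt_ring {
    size_t capacity;            /* total descriptors on the ring */
    size_t in_use;              /* descriptors already posted */
    bool   kicked;              /* set when hardware is told to service */
} mdt_ring_t;

static mdt_status_t mdt_transmit(mdt_ring_t *ring, mdt_pkt_t *pkts, size_t count)
{
    assert(ring != NULL);
    assert(pkts != NULL || count == 0);

    /* Steps 901-902: walk the request and count the descriptors needed. */
    size_t needed = 0;
    for (size_t i = 0; i < count; i++)
        needed += (pkts[i].hdr.len > 0) + (pkts[i].pld.len > 0);

    /* Steps 903-904: without room for the whole request, retry later. */
    if (ring->capacity - ring->in_use < needed)
        return MDT_QUEUED_FOR_RETRY;

    /* Steps 905-911: lock down and post each packet in turn. */
    for (size_t i = 0; i < count; i++) {
        size_t used = 0;
        if (pkts[i].hdr.len > 0) {          /* steps 905-906 */
            pkts[i].hdr.locked = true;
            used++;
        }
        if (pkts[i].pld.len > 0) {          /* steps 907-908 */
            pkts[i].pld.locked = true;
            used++;
        }
        ring->in_use += used;               /* step 910 */
    }

    /* Step 912: ask the hardware to service the posted packets. */
    ring->kicked = true;
    return MDT_POSTED;
}
```

Because the space check covers the entire request before anything is posted, this sketch never leaves a multi-data request partly on the ring.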

[0075] The foregoing descriptions of specific embodiments of the present invention have been presented for purposes of illustration and description. They are not intended to be exhaustive or to limit the invention to the precise forms disclosed, and obviously many modifications and variations are possible in light of the above teaching. The embodiments were chosen and described in order to best explain the principles of the invention and its practical application, to thereby enable others skilled in the art to best utilize the invention and various embodiments with various modifications as are suited to the particular use contemplated. It is intended that the scope of the invention be defined by the Claims appended hereto and their equivalents.

1. A computer network system, comprising: a processor; a plurality of network devices; a device driver; an operating system kernel comprising a multi-packet transmission system for transmitting multiple data packets in a single transmission cycle to a network device coupled to the computer system in a single system call to a data link layer in said operating system kernel.
2. The computer system of claim 1, wherein said multi-packet transmission system comprises an array of packet information logic modules for storing information required to access header and payload information in each packet.
3. The computer system of claim 2, wherein said multi-packet transmission system further comprises a packet stack header logic module.
4. The computer system of claim 3, wherein said multi-packet transmission system further comprises a packet offload logic module for defining device offload types for the multi-packet transmission system.
5. The computer network system of claim 4, wherein said multi-packet transmission system further comprises a layer 2 addressing logic module for transmitting said multiple packet data as layer 2 addressing packets.
6. The computer network system of claim 2, wherein said packet information logic modules comprise header length information for defining space to set aside in main memory from the beginning of a current packet header to the beginning of the current layer's header to enable the multi-packet transmission system to traverse a protocol stack in said network system.
7. The computer network system of claim 6, wherein said packet information logic modules further comprise packet tail length information for defining space set aside in said main memory, from the end of a current layer header to the end of the space set aside for the current packet header.
8. The computer network system of claim 7, wherein said packet information logic modules further comprise packet header offset information for defining header offset information relative to the base of the header buffer for calculating header input/output address posted to said network device for the current packet to be fetched by said network device and transmitted.
9. The computer network system of claim 8, wherein said packet information logic modules further comprise packet header length information for defining a total length of all headers for a particular packet.
10. The computer network system of claim 9, wherein said packet information logic modules further comprise offload offset information for defining a starting offset for a list of per packet offloads.
11. The computer network system of claim 4, wherein said packet offload logic modules comprise an offload type for defining a parameter type of an offload a driver and a hardware device is expected to provide to said multi-packet transmission system.
12. An operating system kernel in a computer network, comprising: a network subsystem; a plurality of network devices; a transport module for processing a multi-packet data block in a single transport cycle; and a multi-packet transmission module for transmitting said multi-packet data block as a single data transmission block in a single system call from said operating system kernel to a requesting one of said plurality of network devices.
13. The operating system kernel of claim 12, wherein said multi-packet transmission module comprises a contiguous block of a plurality of header information with an associating payload data information.
14. The operating system kernel of claim 13, wherein said multi-packet transmission module further comprises a packet offload logic module for defining device offload types for the multi-packet transmission system.
15. The operating system kernel of claim 14, wherein said multi-packet transmission module further comprises header length information for defining space to set aside in main memory from the beginning of a current packet header to the beginning of the current layer's header to enable the multi-packet transmission system to traverse a protocol stack in said network system.
16. The operating system kernel of claim 15, wherein said packet information logic module further comprises packet tail length information for defining a space set aside in said main memory from the end of a current layer header to the end of the space set aside for the current packet header.
17. The operating system kernel of claim 16, wherein said packet information logic module further comprises packet header offset information for defining header offset information relative to the base of the header buffer for calculating header input/output address posted to said network device for the current packet to be fetched by said network device and transmitted.
18. The operating system kernel of claim 17, wherein said packet information logic module further comprises packet header length information for defining a total length of all headers for a particular packet.
19. The operating system kernel of claim 18, wherein said packet information logic module further comprises offload offset information for defining a starting offset for a list of per packet offloads.
20. The operating system kernel of claim 19, wherein said packet offload logic module comprises an offload type for defining the parameter type of an offload that a driver and a hardware device is expected to provide to said multi-packet transmission system.
21. A computer implemented multi-data request transmission system comprising: data request primitive structure logic comprising information for transmitting header and payload information associated with a multi-packet data transmission request into layer 2 addressing packets; payload data logic for providing payload information associated with each packet in said multi-packet data transmission request; and packet header information logic for providing header information associated with each of said multi-packet data transmission requests.
22. A system as described in claim 21 wherein said multi-data request transmission system further comprises a data buffer for storing a plurality of packets of data transmitted in a single transmission cycle to said network devices.
23. A system as described in claim 22 wherein said data is a kernel data structure of a computer operating system.
24. A system as described in claim 23 wherein application programs in said computer system are aware of said data buffer for said data structure.
25. A method of transmitting multiple packets in a computer system network to a network device in a single call to a data link layer in said computer network, comprising: allocating a master block for a data payload; allocating header information and associated block data; generating header-payload pairs by linking said header information to a payload data; allocating a multi-data transmit descriptor and an associating memory block; generating a contiguous data block corresponding to said multiple packets; and transmitting said multi-packet data block to said device driver.
26. The method of claim 25, wherein said transmitting said multi-packet data block comprises generating duplicate network protocol stack messages to said network device.
27. The method of claim 26, wherein said transmitting said multi-packet data block further comprises locking down said header and payload information for a direct memory access in said computer network.
28. The method of claim 27, wherein said transmitting said multi-packet data block further comprises updating said network protocol stack with said header-payload pairs until said multi-data request is completely transmitted.